A Latent Class Modeling Approach to Evaluate Behavioral Risk Factors and Health-Related Quality of Life

Introduction The Behavioral Risk Factor Surveillance System (BRFSS) monitors multiple health indicators related to 4 domains: risky behaviors, health conditions, health care access, and use of preventive services. When evaluating the effect of these indicators on health-related quality of life (HRQOL), conventional analytical methods focus only on individual risks and thus are not ideally suited for analyzing complex relationships among many health indicators. The objectives of this study were to 1) summarize and group multiple related health indicators within a health domain by using latent class modeling and 2) analyze how 24 health indicators in 4 health domains were associated with 2 HRQOL outcomes to identify Rhode Island adult populations at highest risk for poor HRQOL. Methods The 2008 Rhode Island BRFSS, a population-based, random-digit–dialed telephone survey, collected responses from 4,786 adults aged 18 years or older. We used latent class modeling to assign 24 health indicators to high-, intermediate-, and low-risk groups within 4 domains. The effects of all risks on HRQOL were then assessed with logistic regression modeling. Results The latent class model with 3 classes fitted the 4 domains best. Respondents with more health conditions and limited health care access were more likely to have frequent physical distress. Those with more health conditions, risky behaviors, and limited health care access were more likely to have frequent mental distress. Use of preventive health services did not affect risk for frequent physical or mental distress. Conclusion The latent class modeling approach can be applied to identifying high-risk subpopulations in Rhode Island for which interventions may have the most substantial effect on HRQOL.


Introduction
The 2 overarching goals of Healthy People 2010 are "to increase quality and years of healthy life" and "to eliminate health disparities" (1). Tracking population health-related quality of life (HRQOL) can monitor progress toward national health objectives (2) and helps identify health disparities. Besides asking HRQOL questions, populationbased health surveys such as the Behavioral Risk Factor Surveillance System (BRFSS) monitor multiple health domains, including risky behaviors, health conditions, health care access, and use of preventive services, which are all associated with HRQOL. Each of these domains has many indicators, most of which are highly correlated. In epidemiologic studies, adjusting for multiple, often correlated, confounders is necessary to draw valid inferences about associations. Because traditional analytical methods such as linear regression (3) or logistic regression (4)(5)(6)(7)(8)(9)(10)(11)(12) focus only on individual risks, they are not ideally suited to analyzing complex relationships involving multiple health domains. Indicators measure only special aspects of each domain; therefore, traditional methods must include several indicators to assess these domains comprehensively. However, including highly correlated items can lead to collinearity and problems with estimating and evaluating the association of these domains with outcomes such as HRQOL. This study shows how latent class modeling can summarize indicators within a health domain into latent classes and how logistic regression can then associate these latent classes with HRQOL. Specifically, we applied these methods to Rhode Island 2008 BRFSS data to analyze associations among 24 indicators in 4 health domains with 2 HRQOL outcomes to identify Rhode Island populations at highest risk for poor HRQOL.

Data source
BRFSS is a telephone survey administered in all 50 states and 4 US territories with funding and specifications from the Centers for Disease Control and Prevention (CDC). BRFSS monitors behavioral health risks that contribute to the leading causes of disease and death among adults aged 18 years or older. It also monitors access to health care and certain health conditions. Rhode Island has participated in BRFSS since 1984. From the 2008 Rhode Island BRFSS, a population-based, random-digit-dialed telephone survey of 4,786 respondents, we selected 24 dichotomous indicators of 4 health domains -risky behaviors, health conditions, health care access, and use of preventive services (13) -and 2 HRQOL outcomes. Although CDC's institutional review board (IRB) approved the survey administration, IRB approval was unnecessary for these secondary analyses.

Indicators
We assessed 5 risky behaviors: current smoking (smokes cigarettes daily or some days), binge drinking (≥5 drinks for men, ≥4 drinks for women on 1 occasion during the past 30 d), heavy drinking (≥3 drinks/d for men, ≥2 drinks/d for women), drinking and driving in the past 30 days, and not always wearing a seatbelt ( Figure 1).
Respondents reported ever having had a diagnosis of the following 8 health conditions: asthma (ever being told by a health professional that they had asthma and they have asthma now), diabetes, obesity (body mass index ≥30 kg/ m 2 , calculated from self-reported height and weight), disability (limited in any activities or use special equipment), loss of 6 or more teeth because of decay or gum disease, angina/coronary heart disease, heart attack/myocardial infarction, or stroke.
Respondents answered 5 questions about access to health care: being uninsured (no health care coverage), not seeing a physician because of cost (despite needing to, during the past year), having no regular health care provider, no routine checkup in the past year, or no dental visit in the past year.
We assessed use of preventive services with 6 questions: never had a pneumonia vaccination, had no influenza vaccination in the past 12 months, no prostate-specific antigen (PSA) test in the past 2 years (men ≥40 y), never had a sigmoidoscopy or colonoscopy (men and women ≥50 y), had no mammogram in past 2 years (women ≥40 y), or had no Papanicolaou (Pap) test in the past 3 years (women with an intact cervix).

Outcomes: 2 HRQOL items
The 2 HRQOL items asked respondents about their recent physically unhealthy days ("Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good?") and mentally unhealthy days ("Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good?"). These variables were dichotomized at 14 or more days (frequent physical or mental distress) and 0 to 13 days (no physical or mental distress). Others have used this cutpoint to identify frequent distress (5,10).
We included as potential confounders in the multivariable analyses age group (18-44 y, 45-64 y, ≥65 y), sex, race/ethnicity (non-Hispanic white, Hispanic, non-Hispanic other [which includes African Americans because they are rare in the adult Rhode Island population]), education level, and employment status (2,12,14). We excluded income from the models because 14% of respondents lacked income information; we instead used education level and employment status as indicators of socioeconomic status because both highly correlate with income.

Analyses
The latent class model collapses many discrete indicators into a few meaningful latent classes, categories that refer to levels of a domain, and estimates the probability that a particular person belongs to a specific latent class. The unit of analysis is the response pattern (15,16). In our study, the 5 indicators of risk behaviors have a total of 32 (2 5 ) possible response patterns (eg, current smoker vs not current smoker); the 8 indicators of health conditions, 256 (2 8 ) possible response patterns; the 5 indicators of access to health care, 32 (2 5 ) possible response patterns; and the 6 indicators of use of preventive services, 16 (2 4 ) (men) and 32 (2 5 ) (women) possible response patterns (17)(18)(19). For example, for the 5 indicators of risky behavior, latent class modeling can summarize the 32 possible response patterns into an interpretable number of latent classes (eg, 2-4).
The latent class model provides the prevalence of each latent class (marginal probabilities), and the class-specific response probabilities of each indicator (conditional probabilities) (14,(20)(21)(22)(23). Formulae and parameters for latent classes appear in the Appendix.
To determine the most parsimonious model, we sequentially fitted models from 1 to more latent classes (16,18,24), compared successive models using the Bayesian information criterion (BIC), and chose the model with the smallest BIC values (25,26) (Table 1).
To examine further the influence of high-risk classes on health, we performed 2 logistic regressions to assess the association of all these latent classes with the 2 outcomes, physical health and mental health. These models estimate odds ratios with 95% confidence intervals for frequent physical and mental distress of the higher risk classes relative to the lower risk classes, adjusting for age, sex, race/ethnicity, education, and employment status.
We used Latent Gold version 4.5 (Statistical Innovations, Inc, Belmont, Massachusetts) for latent class modeling because this software can accommodate the complex sampling design of BRFSS and can also handle missing data. We performed all other analyses with SAS version 9.1 (SAS Institute, Inc, Cary, North Carolina) and accounted for the complex sampling design of BRFSS. Significance was set at P < .05.

Results
The latent class model with 3 latent classes provided the best fit for all 4 health domains (Table 1). Nine percent of Rhode Island's adult population belonged to the highest risk class for the 5 risky behaviors ( Figure 2): 29% or more  Six percent of the adult population belonged to the highest risk class for the 8 health conditions (Figure 3): at least 15% in this class reported having asthma, diabetes, obesity, a disability, 6 or more teeth lost, angina or coronary heart disease, a myocardial infarction, or a stroke. In 18% of the population, 20% or more reported having asthma, diabetes, obesity, a disability, or 6 or more teeth lost, but less than 6% reported the cardiovascular conditions -angina, heart attack, or stroke. In the remaining threefourths of the population, 10% or less reported any of these conditions except obesity (16%).
Ten percent of Rhode Island's adult population belonged to the highest risk class for difficulty accessing the 5 kinds of health care services ( Figure 4): at least 45% of these reported having no health care coverage, not being able to see a physician because of cost, having no regular health care provider, having no health checkup in the past year, and having no dental visits in the past year. In 7% of the adult population, at least 78% said they had visited the dentist, had health coverage, and could afford to see a doctor, but at least half had not had a routine checkup in the past year or a regular provider. Less than 16% of the remaining 83% of the adult population reported difficulty accessing any of these kinds of health care.
Twenty-one percent of Rhode Island's adult population belonged to the highest risk class for not using the 6 kinds of preventive health care services ( Figure 5): at least 75% of these had never had a pneumonia vaccination and had not had an influenza vaccination in the past year; at least 77% of those aged 50 or older had never had a sigmoidoscopy or a colonoscopy; 99% of the men aged 40 or older had not had a PSA test in the past 2 years; and at least 48% of women had not had a Pap test in the past 3 years or a mammogram in the past 2 years for women aged 40 or older. In 52% of the population, more than 70% had not had a pneumonia vaccination or an influenza vaccination in the past year but at least 65% had used 1 or more of the other preventive services. In the remaining 27% of the population, at least 70% used all of the different kinds of preventive health care services.
Approximately 11% of Rhode Island adults reported frequent physical or mental distress. Frequent physical distress was more common in older adults, those with a high school education or less, the retired, the unemployed, and those unable to work ( Table 2). Frequent mental distress was more common in younger adults, women, those with some college or less education, homemakers, students, the unemployed, and those unable to work.
Respondents with more health conditions and limited health care access were more likely to have frequent physical distress (Table 3). Those with more risky behaviors, health conditions, and limited health care access  were more likely to have frequent mental distress. Use of preventive health services did not affect risk for frequent physical or mental distress.

Discussion
Our results are consistent with previous findings (8)(9)(10)(11)27) that indicators of individual health conditions, such as obesity (8,9) and asthma (27), or individual health care access, such as no health care coverage (11), are associated with frequent physical and mental distress. Individual risky behaviors such as binge drinking (10,11), are only related to frequent mental distress. Our methods allow us to go beyond these results by including many correlated items as indicators for 4 health domains in assessing 2 outcomes. Other authors have either enumerated patterns of adherence to specific health behaviors (28) or summed the number of risk behaviors or health conditions to provide a summary score for analysis (29). The latter approach has been criticized because it weighs risk behaviors or health conditions equally regardless of their severity (30). Latent class modeling is qualitatively more informative than a summary score. For instance, the highest-risk latent class of risky behaviors for Rhode Island adults differs most from lower-risk classes by their larger percentages who binge drink, drink heavily, or drink and drive rather than who do not always use seatbelts or currently smoke.
Combining latent class modeling with logistic regression provides 3 advantages in analyzing data from surveys like BRFSS. First, by grouping multiple correlated health indicators into a few health domains, latent class modeling allows these indicators to become associated with the outcomes through these domains and minimizes multicollinearity. To avoid this problem, many studies (8)(9)(10)(11)27) evaluate only some indicators that partially reflect only a specific aspect of each health domain, making their analyses subject to confounding from other excluded indicators. Second, combining latent class models and logistic regression models allows several indicators to contribute to each health domain, thus improving the reliability of these domains by reducing the variation in the health domain due to any single indicator. Finally, once the number of latent classes is selected, latent class models do not require specifying arbitrary cutoffs to distinguish these classes.
These methods analyze men and women together rather than separating them. When analyzed by sex, the patterns of the 4 health domains for men and women were similar. Although some of the indicators in the "preventive services" domain are sex-specific, the Latent Gold software identifies these indicators as structurally missing and treats them as missing at random and totally dependent on the covariates (31). Although men generally use preventive services less often than women, the latent class patterns for the 3 preventive service indicators that are not sexspecific are similar among men and women. Finally, only about 1,690 men and 3,096 women responded to the 2008 Rhode Island BRFSS, making it difficult to carry out sexspecific analyses with so many indicators while avoiding artifacts.
We summarized 24 indicators in 4 health domains to predict 2 outcomes. Some may object that we did not include some important indicators, that some health behaviors were misclassified as health conditions, or that the health domains provide little explanatory power beyond a summary score of risky behaviors and health conditions. BRFSS does not ask the same questions every year. By choosing 2008, we included questions on mammography screening, Pap tests, PSA tests, colorectal screening, oral health, and drinking and driving in the past 30 days. We also excluded questions asked only in odd years about physical activity, eating fruits and vegetables, the burden of arthritis, and gastrointestinal diseases. We excluded questions about HIV and AIDS because very few respondents reported either condition. We also excluded 2 questions on emotional support and life satisfaction that appeared to form a fifth latent health domain because this domain compli- cated interpretation while not adding much to the 4 existing domains. We included obesity as an indicator for a health condition rather than for a health behavior because we view obesity as an outcome of other behaviors (ie, inadequate diet and physical inactivity) (28).
Besides not being able to include some health indicators, this study has several limitations. BRFSS is cross-sectional, making it difficult to determine whether the risky health behaviors, health conditions, and access to health care increased risk for frequent physical and mental distress or whether this distress increased risk for these behaviors, conditions, and health care access. Because BRFSS respondents reported information, not being able to validate this information and possible recall bias may have affected the observed associations. Latent class modeling has its own limitations that require interpretation. Each class should have a substantial number of observations because small classes may be artifactual outliers. a Models were sequentially fitted from 1 to more latent classes (16,18,24) and compared to successive models using BIC. The smallest BIC values (25,26) indicate the most parsimonious model. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Appendix: Parameters for Latent Classes
For a latent class model with 3 categorical indicators (c1, c2 and c3) and 5 indicators (Y1, Y2, Y3, Y4, Y5), the posterior probability of class c under pattern of y1, y2, y3, y4 and y5 is where = P(y1|c)P(y2|c)P(y3|c)P(y4|c)P(y5|c) Therefore we need to calculate P(c) and P(y k |c) first. P(c) can be calculated from parameters of Model for Clusters as The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.